Bootstrapping Morphological Analysis of Gĩkũyũ Using Unsupervised Maximum Entropy Learning
Abstract
This paper describes a proof-of-principle experiment in which maximum entropy learning is used for the automatic induction of shallow morphological features for Gĩkũyũ, a resource-scarce Bantu language. This novel approach circumvents the limitations of typical unsupervised morphological induction methods that employ minimum edit distance metrics to establish morphological similarity between words. The experimental results show that the unsupervised maximum entropy learning approach compares favorably to the established AutoMorphology method.
Similar resources
On Induction of Morphology Grammars and its Role in Bootstrapping
Different Alignment Based Learning (ABL) algorithms have been proposed for unsupervised grammar induction, e.g. Zaanen (2001) and Déjean (1998), in particular for the induction of syntactic rules. However, ABL seems to be better suited for the induction of morphological rules. In this paper we show how unsupervised hypothesis generation with ABL algorithms can be used to induce a lexicon and m...
Unsupervised Induction of Arabic Root and Pattern Lexicons using Machine Learning
We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allows us to...
From Finite-State to Inversion Transductions: Toward Unsupervised Bilingual Grammar Induction
We report a wide range of comparative experiments establishing for the first time contrastive foundations for a completely unsupervised approach to bilingual grammar induction that is cognitively oriented toward early category formation and phrasal chunking in the bootstrapping process up the expressiveness hierarchy from finite-state to linear to inversion transduction grammars. We show a cons...
Minimum Conditional Entropy Clustering: A Discriminative Framework for Clustering
In this paper, we introduce an assumption which makes it possible to extend the learning ability of discriminative models to the unsupervised setting. We propose an information-theoretic framework as an implementation of the low-density separation assumption. The proposed framework provides a unified perspective of Maximum Margin Clustering (MMC), Discriminative k-means, Spectral Clustering and Unsu...